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ABSTRACT 

This  paper  addresses  the  design  of  distributed  cogni¬ 
tive  medium  access  control  (MAC)  protocols  for  oppor¬ 
tunistic  spectrum  access  (OSA)  under  an  energy  constraint 
on  the  secondary  users.  The  objective  is  to  maximize 
the  expected  number  of  information  bits  that  can  be 
delivered  by  a  secondary  user  during  its  battery  lifetime 
without  causing  interference  to  primary  users.  By  ab¬ 
sorbing  the  residual  energy  level  of  the  secondary  user 
into  the  state  space,  we  formulate  the  energy-constrained 
OSA  problem  as  an  unconstrained  partially  observable 
Markov  decision  process  (POMDP)  and  obtain  the  op¬ 
timal  spectrum  sensing  and  access  policy.  We  analyze 
and  reduce  the  computational  complexity  of  the  optimal 
policy.  We  also  propose  a  suboptimal  solution  to  energy- 
constrained  OSA,  whose  computational  complexity  can  be 
systematically  traded  off  with  its  performance.  Numerical 
examples  are  provided  to  study  the  impact  of  spectrum 
occupancy  dynamics,  channel  fading  statistics,  and  energy 
consumption  characteristics  of  the  secondary  user  on  the 
optimal  sensing  and  access  decisions. 

1.  INTRODUCTION 

Network  centric  warfare  depends  on  assured  global 
wireless  networks  that  can  be  established  rapidly,  any¬ 
where  and  anytime.  One  of  the  most  difficult  challenges 
is  the  increasingly  crowded  spectrum  under  the  existing 
static  allotment  policy  on  the  one  hand,  and  the  increasing 
demand  for  bandwidth  to  support  more  services.  The 
number  of  potentially  netted  devices  is  large  (and  increas¬ 
ing)  and  their  needs  are  so  dynamic  that  the  overhead 
in  distributing  and  updating  topology  information  can  be 
prohibitive.  Indeed,  the  spectrum  allocation  problem  is 
extremely  difficult  for  multi-nation  coalition  operations, 
as  the  composition  of  the  operating  units  can  change 
rapidly.  A  further  impediment  to  rapid  deployment  is 
that  spectrum  allotment  policy  varies  from  country  to 
country  (sometimes  from  service  to  service),  often  with 
mandated  a  priori  assignment  of  spectrum  to  services 
before  deployment.  This  should  be  contrasted  with  the 
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commercial  wired  world,  where  infrastructure  provides 
fiber  with  virtually  unlimited  bandwidth,  and  efficient 
utilization  is  largely  dictated  by  technology  and  cost 
(routers,  transmitters,  switches). 

Opportunistic  spectrum  access  (OSA),  envisioned  by 
the  DARPA  XG  program  (DARPA,  2005),  aims  to  exploit 
the  instantaneous  spectrum  availability  using  sophisticated 
signal  processing  and  networking  techniques.  The  idea  is 
to  identify  available  channel  resources  (frequency,  space- 
time,  and  codes)  and  communicate  opportunistically  in  a 
manner  that  limits  the  level  of  interference  perceived  by 
primary  users.  Such  strategies  are  particularly  relevant  to 
small  units  penetrating  deep  in  an  unknown  territory. 

Related  Work  The  design  of  medium  access  control 
(MAC)  for  OSA  has  been  received  great  attention  re¬ 
cently  (DySPAN,  2005;  CrownCom,  2006).  The  majority 
of  the  literature  (Mangold  et  al.,  2004;  Larcher  et  al., 
2004,  Papadimitratos  et  ak,  2005)  focus  on  a  network  of 
geographically  distributed  secondary  users,  each  affected 
by  a  different  set  of  primary  users  whose  spectrum  access 
activities  are  static  or  slowly  varying  in  time.  The  design 
objective  is  to  allocate  these  spatially  varying  spectrum 
opportunities  among  secondary  users  so  that  the  network- 
level  spectrum  efficiency  is  maximized  subject  to  some 
regulatory  constraint  on  interference  to  primary  users. 

In  (Zhao  et  ak,  2005a;  Zhao  et  ak  2006),  the  exploita¬ 
tion  of  temporal  spectrum  opportunities  resulting  from  the 
bursty  traffic  of  primary  users  has  been  studied.  Within 
the  framework  of  partially  observable  Markov  decision 
process  (POMDP),  the  optimal  distributed  cognitive  MAC 
protocol  that  allows  secondary  users  to  independently 
search  for  and  exploit  instantaneous  spectrum  opportu¬ 
nities  has  been  developed  in  (Zhao  et  ak,  2005a).  This 
protocol  consists  of  a  sensing  strategy  that  determines 
which  channels  in  the  spectrum  to  sense  based  on  spec¬ 
trum  occupancy  dynamics  and  an  access  strategy  that 
determines  whether  to  transmit  over  the  sensed  channels 
based  on  sensing  outcomes.  The  energy  constraint  of 
secondary  users  is,  however,  not  taken  into  account  in 
(Zhao  et  ak,  2005a;  Zhao  et  ak  2006). 

Contribution  The  ability  to  dynamically  and  efficiently 
share  the  spectrum  depends  upon  how  much  information 
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is  available  about  spectrum  occupancy,  how  accurate  the 
information  is,  and  how  quickly  and  fully  this  information 
can  be  exchanged  among  secondary  users.  Given  that 
sensing  and  estimation  outcomes  cannot  be  error  free,  we 
do  not  insist  on  binary  (black  and  white)  estimates  of 
spectrum  occupancy,  rather  we  estimate  occupancy  likeli¬ 
hoods.  Given  that  spectrum  usage  will  be  dynamic  in  the 
scenarios  of  interest,  we  do  not  insist  on  complete  channel 
occupancy  information.  But  spectrum  usage  maps  will  be 
local  (e.g.,  local  interference;  the  hidden  node  problem, 
etc.),  the  cost  of  sensing  channels  and  of  exchanging 
information  can  be  high,  and  the  variation  in  the  time- 
scales  involved  (coherence  time  for  channel  occupancy, 
vs.  time  required  to  transmit,  receive  and  exploit  this 
information)  can  be  large.  Hence,  in  this  paper  we  focus 
on  pairs  of  secondary  users  arriving  at  consensus.  It  is 
worth  stressing  that  we  do  not  assume  the  existence  of  a 
pre-determined  coordination  channel,  since  this  is  at  the 
heart  of  the  problem  that  DSA  must  solve.  The  incor¬ 
poration  of  energy  constraint  can  significantly  complicate 
the  cognitive  MAC  design.  Under  an  energy  constraint, 
sensing  decisions  should  be  made  based  on  not  only 
spectrum  occupancy  dynamics  but  also  channel  fading 
statistics,  and  access  decisions  should  take  into  account 
not  only  the  availability  but  also  the  fading  condition  of 
the  sensed  channels.  This  makes  the  optimal  sensing  and 
access  strategies  opportunistic  in  both  spectrum  and  time. 
Even  the  residual  energy  of  the  secondary  user  will  play  an 
important  role  in  decision-making.  For  example,  when  the 
battery  is  depleting,  should  the  user  wait  for  increasingly 
better  channel  realizations  for  transmission  or  should  it 
lower  the  requirement  on  channel  given  that  sensing  also 
costs  energy?  Clearly,  such  decisions  depend  on  the  energy 
consumption  characteristics  of  secondary  users. 

As  a  starting  point  to  a  detailed  study  of  energy- 
constrained  OSA,  this  paper  establishes  the  fundamental 
limit  on  the  expected  number  of  information  bits  that 
can  be  delivered  by  a  secondary  user  during  its  battery 
lifetime.  By  absorbing  the  residual  energy  level  of  the 
secondary  user  into  the  state  space,  we  show  that  the 
energy-constrained  OSA  problem  can  be  formulated  as  an 
unconstrained  POMDP.  Based  on  the  theory  of  POMDP, 
we  obtain  the  optimal  sensing  and  access  policy  which  not 
only  provides  a  performance  benchmark  but  also  enables 
us  to  study  the  impact  of  spectrum  occupancy  dynamics, 
channel  fading  statistics,  and  energy  consumption  char¬ 
acteristics  of  the  secondary  user  on  the  optimal  sensing 
and  access  decisions.  However,  our  complexity  analysis 
indicates  that  the  optimal  policy  is  computationally  ex¬ 
pensive.  We  therefore  exploit  the  underlying  structure  of 
the  problem  to  reduce  the  computational  complexity  of 
the  optimal  policy.  We  also  provide  a  suboptimal  solution 
whose  computational  complexity  can  be  systematically 
traded  off  with  its  performance.  Referred  to  as  the  greedy- 


w  strategy,  this  approach  maximizes  the  throughput  of  the 
secondary  user  in  a  fixed  time  window  of  w  slots.  Simula¬ 
tion  result  shows  that  as  the  window  size  w  increases,  the 
performance  of  the  greedy-w  strategy  quickly  approaches 
the  optimal  performance. 

Army  Relevance  The  Future  Combat  System  (FCS) 
(  http://www.army.mil/fcs/index.html  ) 
relies  on  technologies  that  enable  a  fully  mobile,  fully 
communicating,  agile,  situation-aware,  and  survivable 
lightweight  Army  force  with  internetted  C4ISR  systems. 
The  FCS  Brigade  Combat  Team  is  a  network  of  soldiers, 
interoperating  with  other  networks:  unattended  ground 
sensors,  small  unmanned  vehicle  systems,  non-line-of- 
sight  weapons  systems,  UAVs,  possibly  armed  robots, 
etc.  The  number  of  radio  units  in  the  soldier  system 
itself  is  expected  to  increase.  An  enabling  technology 
is  jointly  designed  physical  (PHY)  and  MAC  protocols 
that  can  support  sensing,  detection,  and  exploitation  of 
spectrum  opportunities,  with  strict  constraints  on  the 
effect  on  primary  users.  To  this  end,  an  energy-efficient, 
spectrum-adaptive  medium  access  approach  will  have  a 
significant  impact  on  achieving  agile  and  situation-aware 
communications  for  lightweight  army  forces.  Frequency 
allocation  has  been  a  manual  task  for  well  over  a  century 
now,  and  thus  cannot  adapt  to  battlefield  tempo.  Thus 
the  approaches  outlined  here  provide  a  framework  for 
the  automatic  management  of  spectrum,  and  will  be 
critical  in  dynamic  coalition  operations,  as  joint  forces 
come  together  and  separate.  They  could  also  be  useful  in 
peacekeeping  scenarios  where  military,  other  government, 
and  NGO  networks  must  co-exist.  Multi-channel  systems 
such  as  JTRS  could  also  benefit  from  the  adaptive 
multi-channel  cognitive  MAC  reported  in  this  paper. 

2.  PROBLEM  STATEMENT 

We  consider  a  spectrum  consisting  of  N  channels 
(e.g.,  frequency  bands),  each  with  bandwidth  Bn  (n  = 
1,  •  •  •  ,  N).  The  spectrum  is  licensed  to  a  primary  network 
whose  users  communicate  according  to  a  synchronous  slot 
structure.  Let  Sn  €  {0  (occupied),  1  (idle)}  denote  the 
availability  of  channel  n  in  a  slot.  We  assume  that  the 
spectrum  occupancy  S  =  [Si, . . . ,  Sn]  &  {0, 1}A  follows 
a  discrete  Markov  process  with  2N  states. 

Consider  an  ad  hoc  secondary  network  whose  users 
independently  seek  instantaneous  spectrum  opportunities 
in  these  N  channels.  Each  secondary  user  is  powered  by 
a  battery  with  initial  energy  £q.  At  the  beginning  of  each 
slot,  a  secondary  user  with  data  to  transmit  chooses  at 
most  M  (1  <  M  <  N )  channels  to  sense  and  then 
decides  whether  to  access  these  channels  according  to  the 
sensing  outcomes.  Our  goal  is  to  determine  the  sensing 
and  access  decisions  sequentially  in  each  slot  so  as  to 
maximize  the  total  expected  number  of  information  bits 
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that  can  be  delivered  by  a  secondary  user  during  its  battery 
lifetime.  For  ease  of  presentation,  we  assume  M  =  1.  Our 
results  can  be  generalized  to  M  >  1. 

2.1  Protocol  Structure 

Since  a  channel  only  presents  an  opportunity  to  a  pair 
of  secondary  users  if  it  is  available  at  both  the  transmitter 
and  the  receiver,  spectrum  opportunities  need  to  be  identi¬ 
fied  jointly  by  the  transmitter  and  the  receiver  (Zhao  et  al., 
2005b).  Below,  we  briefly  describe  the  protocol  structure. 

Suppose  that  the  transmitter  and  the  receiver  have 
tuned  to  the  same  channel  after  the  initial  handshake 
as  described  in  (Zhao  et  al.,  2005b).  At  the  beginning 
of  a  slot,  the  transmitter  and  the  receiver  hop  to  the 
same  channel1.  If  the  channel  is  sensed  to  be  available, 
the  transmitter  generates  a  random  backoff  time.  If  the 
channel  remains  idle  when  its  backoff  time  expires,  it 
transmits  a  short  request-to-send  (RTS)  message  to  the 
receiver,  indicating  that  the  channel  is  available  at  the 
transmitter.  Upon  receiving  the  RTS,  the  receiver  esti¬ 
mates  the  channel  fading  condition  using  the  RTS,  and 
then  replies  with  a  clear-to-send  (CTS)  message  if  the 
channel  is  also  available  at  the  receiver.  The  receiver  also 
informs  the  transmitter  of  the  current  fading  condition  by 
piggybacking  the  estimated  channel  state  to  the  CTS.  After 
a  successful  exchange  of  RTS-CTS,  the  transmitter  and  the 
receiver  can  communicate  over  this  channel.  At  the  end  of 
this  slot,  the  receiver  acknowledges  every  successful  data 
transmission.  Note  that  at  the  beginning  of  each  slot,  the 
transmitter  and  the  receiver  can  also  choose  not  to  hop  to 
any  channel  and  turn  to  sleep  mode  until  the  beginning  of 
the  next  slot. 

2.2  Energy  Model 

We  assume  that  channel  between  the  secondary  user 
and  its  destination  follows  a  block  fading  model,  i.e., 
the  channel  gain  in  a  slot  is  a  random  variable  (RV), 
identically  and  independently  distributed  (i.i.d.)  across 
slots  but  not  necessarily  i.i.d.  across  channels. 

Let  Etx(n)  denote  the  energy  consumed  in  transmit¬ 
ting  over  channel  n  in  a  slot.  Note  that  £}x  (n)  is  a  RV 
depending  on  the  current  fading  condition  of  channel  n. 
In  general,  the  better  the  channel  condition,  the  lower 
the  required  transmission  energy.  Let  L  be  the  number  of 
power  levels  at  which  the  secondary  user  can  transmit  and 
£k  the  energy  consumed  in  transmitting  at  the  k- th  power 
level  in  a  slot.  The  transmission  energy  consumption 
E)x(n)  thus  has  realizations  restricted  to  a  finite  set  £}x 
given  by 

Etx(n)  e£tx  =  {£fc}fc=o.  (1) 

1  Note  that  the  protocols  developed  in  this  paper  can  ensure  transceiver 
synchronization  in  a  distributed  fashion.  See  details  in  Sec  3.3 


where  0  <  c-|  <...<£/,<  oo  and  eo  =  0  indicates  that 
the  secondary  user  does  not  transmit.  We  also  consider  the 
energy  es  consumed  by  the  secondary  user  in  sensing  a 
channel  and  the  energy  ep  consumed  in  its  sleeping  mode. 

Let  E  denote  the  residual  energy  level  of  a  secondary 
user  at  the  beginning  of  a  slot.  Note  that  E  is  a  RV 
determined  by  the  channel  conditions  and  the  sensing  and 
access  decisions  in  all  previous  slots.  Thus,  E  belongs  to 
finite  set  £r  given  by 

L 

E  G  £r  —  {e  •  e  —  £q  ^  A  S}£)  cep, 

k—0  ^  ^ 

e  >  0,  c,  cfc  >  0,  c,  cfe  G  Z}  U  {0}, 

where  Cfc  is  the  number  of  slots  when  the  secondary  user 
chooses  to  sense  a  channel  and  then  transmit  over  it  at  the 
/c-th  power  level  and  c  is  the  number  of  slots  when  the 
secondary  user  turns  to  sleeping  mode. 

3.  OPTIMAL  ENERGY-CONSTRAINED  OSA 

The  energy-constrained  OSA  problem  can  be  formu¬ 
lated  as  a  constrained  POMDR  which  is  usually  more 
difficult  to  solve  than  an  unconstrained  one.  By  absorbing 
the  residual  energy  level  of  the  secondary  user  into  the 
state  space,  we  reduce  a  constrained  POMDP  to  an 
unconstrained  one.  Based  on  the  theory  of  POMDP,  we 
obtain  the  optimal  spectrum  sensing  and  access  policy. 

3.1  An  Unconstrained  POMDP  Formulation 

State  Space  In  each  slot,  the  network  state  is  character¬ 
ized  by  the  current  spectrum  occupancy  state  S  G  {0, 1}W 
and  the  residual  energy  E  £  £r  of  the  secondary  user 
at  the  beginning  of  this  slot.  The  state  space  S  of  the 
POMDP  can  be  defined  as 

(S ,E)  £  S  =  {(s,  e)  :  s  G  {0, 1}^, e  G  £r}.  (3) 

Action  Space  After  the  state  transition  of  spectrum 
occupancy  at  the  beginning  of  each  slot,  the  secondary 
user  can  either  choose  a  channel  a  G  {1, . . . ,  iV}  to  sense 
or  turn  to  sleep  (a  =  0).  If  the  secondary  user  chooses 
channel  a  to  sense,  then  it  will  obtain  a  sensing  outcome 
Oa  G  {0,1,..., L}  which  reflects  the  occupancy  state 
and  the  fading  condition  of  the  chosen  channel:  0a  =  0 
indicates  that  channel  a  is  busy  (i.e.,  Sa  =  0)  and  0a  =  k 
( k  =  1, , . . ,  L)  indicates  that  channel  a  is  idle  (i.e., 
Sa  =  1)  and  the  fading  condition  requires  the  secondary 
user  to  transmit  at  the  fc-th  power  level  (i.e.,  £tx(a)  =  £&). 
Given  sensing  outcome  0a,  the  secondary  user  decides 
whether  to  transmit  over  the  chosen  channel.  Let  T G 
{0  no  access,  1  access}  (k  =  0, . . . ,  L)  denote  the  access 
decision  under  sensing  outcome  0Q  =  k.  Since  we  have 
assumed  perfect  spectrum  sensing,  the  access  decision 


3 


under  0a  =  0  (busy)  is  simple:  <f>a(0)  =  0  (no  access). 

In  this  case,  secondary  users  will  not  collide  with  primary 
users. 

The  action  space  A  consists  of  all  sensing  decisions 
a  and  access  decisions  <Fa  =  [<ba(l), . . . ,  <f>a(L)]: 

(a,  *„)  €  A  =  {(0,  [0, . . . ,  0])}  U  {(a,  </>)  :  a  e  {1, . . . ,  TV}, 
<£=  0(1), ■■■,#£)]  e{o,i}L}.  (4) 

Note  that  the  access  decision  Tv,  associated  with  sensing 
action  a  =  0  (sleeping  mode)  is  determined  by  <f>o(fc)  =  0 
for  all  1  <  k  <  L. 

Network  State  Transition  At  the  beginning  of  each  slot, 
the  spectrum  occupancy  state  S  transits  independently  of 
the  residual  energy  E  according  to  transition  probabili¬ 
ties  {ps^},  where  ps  s/  denotes  the  probability  that  the 
spectrum  occupancy  state  transits  from  s  £  {0,1}^  to 
s'  £  {0, 1}^.  In  this  paper,  we  assume  that  the  spectrum 
occupancy  dynamics  { pss/ }  are  known  and  remain  un¬ 
changed  during  the  battery  lifetime  of  the  secondary  user. 

If  channel  a  £  {l,...,iV}  is  chosen  in  this  slot, 
the  secondary  user  will  consume  es  in  sensing  and 
in  transmitting.  Thus,  at  the  end  of  this  slot, 
the  residual  energy  of  the  secondary  user  reduces  to 

E'  =  Te{E  |  a,  0O,  $o(0a)): 

Te(E  |  a,  0a,  $a(0a)) 

E  -  ep,  a  =  0,  (5) 

max{E  —  es  —  3>a(0a)eea, 0},  a  0, 


channel  a  £  { 1 , . . . .  A' } .  For  notation  convenience,  we 
define  qi°]  (k)  =  l[fc=0]. 

Reward  Structure  At  the  end  of  each  slot,  the  secondary 
user  obtains  a  non-negative  reward  depending 

on  its  residual  energy  E  at  the  beginning  of  this  slot,  the 
sensing  outcome  0a,  and  the  sensing  and  access  decisions 
(a,  $o(0a)).  Assuming  that  the  number  of  information 
bits  that  can  be  transmitted  over  a  channel  in  one  slot  is 
proportional  to  the  channel  bandwidth,  we  define  imme¬ 
diate  reward  as 

^(o,$„(9a))  a  J  0,  a  =  0, 

B’6“  “  \$a(0a)Bal[B-e.-eeo>O],  «  ¥=  0. 

(7) 

Note  that  no  reward  will  be  accumulated  once  the  battery 
energy  level  drops  below  es  +  e\,  where  E\  is  the  least 
required  transmission  energy.  Hence,  the  total  expected 
accumulated  reward  represents  the  total  expected  number 
of  information  bits  that  can  be  delivered  by  the  secondary 
user  during  its  battery  lifetime. 

Belief  State  At  the  beginning  of  a  slot,  the  secondary 
user  has  the  information  of  its  own  residual  energy  E  but 
not  the  current  spectrum  occupancy  state  S.  Its  knowledge 
of  S  based  on  all  past  decisions  and  observations  can  be 
summarized  by  a  belief  state  A  =  {As}se{o,i}iv  (Small¬ 
wood  and  Sondik,  1971),  where  As  is  the  conditional 
probability  (given  the  decision  and  observation  history) 
that  the  network  state  is  S  =  s  at  the  beginning  of  this 
slot  prior  to  the  state  transition  of  spectrum  occupancy. 


where  ep  is  energy  consumed  in  the  sleeping  mode. 

Observations  Due  to  partial  spectrum  sensing,  the 
secondary  user  does  not  have  full  knowledge  of  the 
spectrum  occupancy  state  in  each  slot.  It,  however,  can 
obtain  the  occupancy  state  of  the  chosen  channel  a  £ 
{1, . . . ,  N}  from  sensing  outcome  (i.e.,  observation)  0a  £ 
{0,1,...,  L}.  Let  qia\k)  be  the  probability  that  the 
secondary  user  observes  0a  =  k  in  the  chosen  channel 
a  given  current  spectrum  occupancy  state  S  =  s.  Under 
perfect  spectrum  sensing,  we  have  that 

q^\k)  =  Pv{ea  =  k\S  =  s} 

l[fc^0]P a(&) 5  if  CL  7^  0,  Sa  =  1,  (6) 

1  [fc=0]  i  if  CL  7^  0,  sa  =  0, 

where  pa(k)  =  Pr{£tx(a)  =  £k}  is  the  probability  that 
the  fading  condition  of  channel  n  requires  the  secondary 
user  to  transmit  at  the  fc-th  power  level,  and  lrxi  is  the  indi¬ 
cator  function:  1  m  =  1  if  x  is  true  and  0  otherwise.  Note 
that  {Po(fc)}fe=i  are  determined  by  the  fading  statistics  of 
channel  a  and  are  independent  of  the  spectrum  occupancy 
state.  From  (6),  we  can  see  that  ^2k=o  ft)  =  1  f°r 
any  spectrum  occupancy  state  s  £  S  and  any  chosen 


At  the  end  of  a  slot,  the  secondary  user  can  update 
the  belief  state  A  for  future  use  based  on  sensing  action 
a  and  sensing  outcome  0a  in  this  slot.  Specifically,  let 
X'  =  T\{X  |  a,  k)  denote  the  updated  belief  state  whose 
element  A{  represents  the  probability  that  the  current 
spectrum  occupancy  state  is  S  =  s  given  belief  state  A 
at  the  beginning  of  this  slot  and  the  observation  0a  =  k 
of  chosen  channel  a  in  the  current  slot.  Applying  Bayes 
rule,  we  obtain  A^  as 

Ag  =  Pr{S  =  s  |  A,  a,  k} 

Es'VPs'.s,  o  =  0>  (8) 

2^s/  As'ps',sI[So=1[fc#0]]  ^  /  n 

\  I  ,  Cl  ^  U, 

s"  Z-/s'  ^s'Ps'iS"  -i-[s^=l[fc^0]] 

where  the  summations  are  taken  over  the  space  {0, 1}^  of 
spectrum  occupancy  state  S.  Note  that  when  the  secondary 
user  turns  to  sleeping  mode  (a  =  0),  no  observation  is 
made  and  the  belief  state  is  updated  according  to  the 
spectrum  occupancy  dynamics  {ps  s/}. 

Unconstrained  POMDP  Formulation  We  have  formu¬ 
lated  the  energy-constrained  OSA  as  a  POMDP  problem. 
A  policy  7 r  of  this  POMDP  is  defined  as  a  sequence  of 
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functions: 

7T=  ■  •  •],  ft*  :  [0,  l]2  X  £r  —>  A, 

where  {a,  <&„}  =  /x« ( X,E)  maps  every  information  state 
(A,  E),  which  consists  of  belief  state  A  £  [0,  l]2  and 
residual  energy  E  £  £r,  at  the  beginning  of  slot  t  to  a 
sensing  decision  a  £  {0,1,..., AT}  and  a  set  of  access 
decisions  3>a  =  [$a(l), . . . ,  <E>a(£)]  S  {0, 1}L. 


The  design  objective  is  to  find  the  optimal  policy  n* 
that  maximizes  the  total  expected  reward: 


7r*  =  argmaxE„ 


Y"  R(o,$0(e0)) 

(s,E),ea 


C t ) 


(9) 


where  Ao  is  the  initial  belief  state  given  by  the  stationary 
distribution  of  spectrum  occupancy.  We  thus  have  an 
unconstrained  POMDP. 


3.2  Optimal  Policy 


need  to  hop  to  the  same  channel  at  the  beginning  of  each 
slot  in  order  to  communicate  (Zhao  et  al.,  2005b).  Here  we 
show  that  the  optimal  sensing  and  access  policy  developed 
in  Section  3.2  ensures  transceiver  synchronization. 

The  protocol  structure  described  in  Section  2.1  en¬ 
sures  that  both  the  transmitter  and  the  receiver  have  the 
same  information  on  the  occupancy  state  and  the  fading 
condition  of  the  sensed  channel  in  each  slot.  Hence,  at 
the  end  of  each  slot,  the  transmitter  and  the  receiver  will 
reach  the  same  updated  belief  state  A  using  (8)  and  the 
same  residual  energy  E  of  the  transmitter  using  (5).  Since 
the  channel  selection  is  determined  by  the  information 
state  (A,  E),  the  transmitter  and  the  receiver  will  hop 
to  the  same  channel  in  the  next  slot,  i.e.,  transceiver 
synchronization  is  maintained. 

4.  OPTIMAL  POLICY  WITH  REDUCED 
COMPLEXITY 


Let  V (A,  E)  be  the  value  function,  which  denotes  the 
maximum  expected  remaining  reward  that  can  be  accrued 
when  the  current  information  state  is  (A ,E).  We  notice 
from  (7)  that  the  value  function  is  given  by  V (A,  E)  = 
0  for  any  information  state  (A ,E)  with  residual  energy 
E  <  es  +  £i.  For  any  other  information  state,  its  value 
function  V(X,E)  is  the  unique  solution  to  the  following 
equation: 

V(\E)=  max  £  u[a)[R^f k)) 

(o,0)eyt^  (10) 

+  V(T\( A  |  a,  k),  Te{E  \  a ,  k,  <£(&)))], 

where  T\(\\a,k)  is  the  updated  belief  state  given  in 
(8),  Te(E  |  a,  k,  <j>(k))  is  the  reduced  battery  energy  given 
in  (5),  and  =  Pr{0a  =  k  j  A}  is  the  probability 
of  observing  Oa  =  k  given  belief  state  A,  which  is 
determined  by  the  spectrum  occupancy  dynamics  and  the 
channel  fading  statistics: 

4Q)  =  As'  Ps,’s  ^a)(fc)-  (n) 

s'e{o,i}N  se{o,i}w 

In  principle,  by  solving  (10),  we  can  obtain  the 
optimal  sensing  and  access  actions  (a*,  4?*)  that  achieve 
the  maximum  expected  reward  V (A,  E)  for  each  possible 
information  state  (A,  E).  We  can  also  obtain  the  maximum 
expected  number  of  information  bits  Vopt  that  can  be 
delivered  by  a  secondary  user  during  its  battery  lifetime 
as  Vopt  =  V(Ao,£o)'  where  Ao  is  the  initial  belief  state. 

3.3  Transceiver  Synchronization 

Transceiver  synchronization  is  a  key  issue  in  distrib¬ 
uted  MAC  design  for  OSA  networks  (Zhao  et  al.,  2005b). 
Specifically,  a  secondary  user  and  its  intended  receiver 


Although  the  value  function  given  in  (10)  can  be 
solved  iteratively,  it  is  computationally  expensive.  In  this 
section,  we  first  identify  the  sources  of  high  complexity 
of  the  optimal  policy  and  then  reduce  the  complexity 
accordingly. 

4.1  Complexity  of  the  Optimal  Policy 

We  measure  the  computational  complexity  of  a  policy 
as  the  number  of  multiplications  required  to  obtain  all 
sensing  and  access  actions  during  the  secondary  user’s 
battery  lifetime  T  when  initial  belief  state  and  battery 
energy  are  given. 

From  (10),  we  notice  that  the  optimal  sensing  and 
access  action  in  the  first  slot  depends  on  the  value 
functions  of  all  possible  information  states  during  the 
battery  lifetime  T.  Hence,  the  computational  complexity 
of  the  optimal  policy  is  determined  by  the  number  of 
multiplications  required  to  calculate  the  value  functions 
of  all  possible  information  states. 

Following  the  complexity  analysis  in  (Djonin  et  al., 
2007),  we  can  calculate  the  number  of  all  possible  infor¬ 
mation  states  (A,  E)  during  the  secondary  user’  battery 
lifetime.  Specifically,  noting  from  (8)  that  the  updated  be¬ 
lief  state  is  the  same  under  all  non-zero  sensing  outcomes 
( k  ^  0),  we  can  see  that  each  information  state  (A,  E) 
can  transit  to  at  most  L  +  1  different  information  states 
under  sensing  action  a  /  0  but  only  one  under  sensing 
action  a  =  0.  Hence,  for  fixed  initial  information  state 
(Aq,  £o),  the  number  of  all  possible  information  states  is 
of  order  0((N(L  +  l))7^1),  which  is  exponential  in  the 
battery  lifetime  T  and  polynomial  in  the  number  N  of 
channels.  Moreover,  from  (10)  and  (11),  we  can  see  that  it 
requires  0{Z\A\2N2N {L  +  l))  multiplications  to  calculate 
each  value  function,  where  \A\  is  the  size  of  the  action 
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space,  2n  is  the  dimension  of  the  belief  state,  and  L  +  1 
is  the  number  of  possible  observations.  Therefore,  the 
computational  complexity  of  the  optimal  policy  is  of  order 
0(3|^l|2Ar2iv(L  +  1  )(N{L  +  1))T~1).  We  can  see  that 
the  complexity  is  mainly  caused  by  the  following  three 
factors:  1)  the  number  0((N(L  +  1))T_1)  of  possible 
information  states;  2)  the  size  \A\  of  the  action  space,  and 
3)  the  dimension  2N  of  the  belief  state.  We  will  address 
the  first  factor  in  Section  5.  In  this  section,  we  focus  on 
the  other  two  factors. 


where  lu'0  =  0,uj'a  =  uja(3a  +  (1  -  uja)aa  (a  G  {1, . . . ,  L}) 

is  the  probability  that  channel  a  is  available  in  the 
current  slot  given  uj,  Te(E  |  a,  k,  (f>a{k))  is  the  reduced 
battery  energy  given  in  (5),  and  the  updated  belief  state 
uj  =  [tui, . . . ,  Wat]  =  T\ {u:  |  a,  k)  is  given  by 

{0,  if  a  ^  0,  n  =  a,  k  =  0, 

1,  if  a  ^  0,  n  =  a,  k  ^  0,  (14) 

u)'n,  otherwise. 


4.2  Reduction  of  Action  Space  Size 


Careful  inspection  of  (5),  (7)  and  (10)  reveals  that 
the  quantity  R{“f [k))+V{Tx(X  \  a,  k),TE(E  \  a ,  k,  </j(k))) 
inside  the  square  parenthesis  of  (10)  only  depends  on 
the  fc-th  entry  <j)(k)  of  the  access  decision  <fr  and  is 
independent  of  <j)(i )  ( i  ^  k).  We  can  thus  simplify  (10)  as 


V(X,E) 


max  { \  i 
^ . ">  to 


(o) 


max 
0We{o,i} 


[R 


(a, </>(&)) 
E,k 


V (7a (A  \  a,k),TE(E  \  a,  k,  <f>(k)))]}. 


(12) 


5.  SUBOPTIMAL  ENERGY-CONSTRAINED  OSA 

We  notice  from  (10)  that  the  optimal  sensing  and 
access  decisions  in  a  slot  rely  on  the  value  functions  of  all 
possible  information  states  in  the  remaining  slots,  which 
significantly  increases  the  computational  complexity  of  the 
optimal  policy.  In  this  section,  we  provide  a  suboptimal 
solution  to  energy-constrained  OSA,  which  reduces  the 
number  of  value  functions  used  in  decision-making.  We 
show  that  the  computational  complexity  of  this  suboptimal 
strategy  can  be  traded  off  with  its  performance. 

5.1  The  Greedy  -uj  Approach 


Note  that  the  maximization  in  (12)  is  taken  over  the  space 
with  size  0(2NL)  increasing  linearly  with  the  number  L 
of  power  levels,  while  that  in  (10)  is  taken  over  the  action 
space  A  whose  size  0{N2L)  increases  exponentially  with 
L. 

4.3  Reduction  of  Belief  State  Dimension 

Assume  that  the  spectrum  occupancy  evolves  inde¬ 
pendently  across  channels.  It  has  been  shown  in  (Zhao 
et  al.,  2005a)  that  =  [tui, . . . ,  utjv],  where  ojn  denotes 
the  probability  (conditioned  on  all  previous  decisions  and 
observations)  that  channel  n  is  available  at  the  beginning 
of  a  slot  prior  to  the  state  transition,  is  a  sufficient  statistic 
for  belief  state  A.  Note  that  the  dimension  of  u>  increases 
linearly  0(N )  with  the  number  N  of  channels  while  that 
of  A  increases  exponentially  0(2N). 


Referred  to  as  greed y-u'  approach,  the  proposed  strat¬ 
egy  maximizes  the  total  expected  reward  in  a  time  window 
of  w  slots.  Let  Y^\\,E)  denote  the  maximum  reward 
that  can  be  accumulated  in  a  window  of  w  slots  given 
information  state  (A,  E )  and  sensing  action  a.  We  can 
calculate  Ywa\\,E)  recursively  by 

Yo{a)(\,E)  =  0 

Y^(\,E)  =  max  [4“f fe)) 

+  max  Ylbl1(Tx{X\a,k),TE(E\a,k,<j)(k)))], 

OG{0,1,...,AT } 

(15) 

where  u^\  T\(X\a,  k),  and  Te(E  \  a,  k,  c/j(k))  are  given 
in  (11),  (8),  and  (5),  respectively.  From  (15),  we  can  see 
that  for  any  w,  Ywa\ A,  E)  =  0  if  E  <  es  +  £i. 


Applying  the  belief  state  uj,  we  can  simplify  the  value 
function  given  in  (12).  Specifically,  let  an  =  I’rj.S'',  = 
1 1  Sn  =  0}  denote  the  probability  that  channel  n  transits 
from  0  (busy)  to  1  (idle)  and  (3n  =  Pr{S^  =  1 1  Sn  =  1} 
the  probability  that  channel  n  remains  idle.  Then,  (12) 
reduces  to 


V(uj,E)=  max  {(1  —  uj'a) 

ae{0,l,...,N} 

x  V(T\(uj  |  a,0),TE(E  \  a,  0,0)) 

L  (13) 

+  “a  max  [R{Efk)) 

J  0(fc)e{o,i} 

+  V(TX( uj  |  a,  k),  Te(E  |  a,  k,  4>{k)))}}, 


Given  belief  state  A  and  residual  energy  E  of  the 
secondary  user  at  the  beginning  of  a  slot,  the  grecdy-(/j 
approach  chooses  channel  aw  that  maximizes  the  reward 
obtained  in  the  next  w  slots  to  sense,  i.e., 

aw  =  arg  max  F^a)(A,  E).  (16) 

ae{0,l,...,iv} 

Given  sensing  outcome  k  £  {1, . . . ,  L},  the  access  deci¬ 
sion  <paw(k)  of  the  greedy-ru  approach  is  given  by 

<t>a w(k)  =  arg  max  {REwk,,p) 

06(0,1} 

+  max  Y^’]_1(Tx(X  \  aw,  k),TE(E  \  aw,  k, <j>))]} . 

(17) 
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Since  its  channel  selection  is  determined  by  the  informa¬ 
tion  state  (A,  E),  the  greedy-;./;  approach  ensures  trans¬ 
ceiver  synchronization  as  shown  in  Section  3.3. 

5.2  Complexity  Vs.  Performance 


Fig.  1 .  The  number  of  information  bits  that  can  be  transmitted  by  the 
secondary  user  during  its  battery  lifetime.  N  =  2,  [-61,-82]  =  [1,1], 
[ai,a2]  =  [0.2, 0.6],  [/3i,/32]  =  [0.8, 0.8],  es  =  0.5,  ep  =  0.1, 
L  =  2,  £tx  =  {1,2},  pn{  1)  =  0.8,  p„(2)  =  0.2  for  n  =  1,2. 

We  can  see  from  (16)  and  (17)  that  the  sensing  and 
access  decisions  made  by  the  greedy-/;;  approach  in  a  slot 
only  depend  on  the  value  functions  of  all  possible  infor¬ 
mation  states  in  the  next  w  slots.  Hence,  the  total  number 
of  value  functions  required  to  determine  the  sensing  and 
access  decisions  during  battery  lifetime  T  is  of  order 
0((N(L  +  1))W~1T),  which  is  linear  in  T.  Clearly,  the 
computational  complexity  of  greedy-u;  approach  increases 
with  w. 

In  Fig.  1,  we  compare  the  performance  of  the  greedy- 
w  approach  with  the  optimal  performance  V(Ao,£o)  as 
the  initial  energy  £q  increases.  We  consider  N  =  2 
independently  evolving  channels  with  different  occupancy 
dynamics.  As  the  window  size  w  increases,  the  perfor¬ 
mance  of  the  greedy-;/;  approach  improves.  It  quickly 
approaches  the  optimal  performance  as  w  increases. 

The  above  observations  show  that  the  computational 
complexity  of  the  greedy-;/;  approach  increases  while  its 
performance  loss  as  compared  to  the  optimal  performance 
decreases  as  the  window  size  w  increases.  Hence,  by 
choosing  a  suitable  w,  the  greedy-;/;  approach  can  achieve 
a  desired  tradeoff  between  complexity  and  performance. 

6.  NUMERICAL  EXAMPLES 

Careful  inspection  of  (10)  reveals  that  a  sensing 
and  access  action  (a,  4>)  G  A  affects  the  total  expected 
reward  in  three  ways;  1)  it  acquires  an  immediate  reward 
4affe»  in  this  slot;  2)  it  transforms  the  current  belief 
state  A  to  T\(\,a,k)  which  summarizes  the  information 


of  spectrum  occupancy  up  to  this  slot;  3)  it  causes  a 
reduction  in  battery  energy  from  E  to  Te{E,  a,  k,  </>(&)), 
leading  to  a  shorter  remaining  battery  lifetime.  Hence, 
to  maximize  the  total  expected  reward  during  battery 
lifetime,  the  optimal  sensing  and  access  policy  should 
achieve  a  tradeoff  among  gaining  instantaneous  reward, 
gaining  information  for  future  use,  and  conserving  en¬ 
ergy.  In  this  section,  we  study  the  impact  of  spectrum 
occupancy  dynamics,  channel  fading  statistics,  and  energy 
consumption  characteristics  on  the  optimal  sensing  and 
access  actions. 

To  sense  or  not  to  sense?  The  secondary  user  may 
choose  to  sense  in  order  to  gain  immediate  reward  and 
channel  occupancy  information,  but  not  to  sense  in  or¬ 
der  to  conserve  energy.  Hence,  the  optimal  decision  on 
whether  to  sense  should  strike  a  balance  between  gaining 
reward/information  and  conserving  energy.  In  Fig.  2,  we 
study  the  optimal  sensing  decision  l[Q*^o]  >n  a  particular 
slot  under  different  spectrum  occupancy  dynamics  and 
belief  states. 


Fig.  2.  The  optimal  decision  1[ a*^o]  on  whether  to  sense  under 
different  spectrum  occupancy  dynamics  and  belief  states.  N  =  2, 
[81,  B2]  =  [1, 1],  £0  =  4,  es  =  0.6,  ev  =  0.1,  L  =  2,  £tx  =  {1,  2}, 
pn(  1)  =  Pnl 2)  =  0.5  for  n  =  1,2. 

We  consider  N  =  2  independently  evolving  channels 
with  identical  spectrum  occupancy  dynamics  a\  =  a2  = 
a  and  /3i  =  /?2  =  /3-  We  assume  that  (3  =  1  —  a.  Hence, 
the  stationary  distribution  of  spectrum  occupancy  state 
S  is  given  by  u>i  =  [0.5,  0.5].  Consider  another  belief 
state  u: 2  =  [0,0]  with  which  the  secondary  user  has  full 
information  on  the  spectrum  occupancy  prior  to  the  state 
transition  in  this  slot.  Conditioned  on  the  belief  states 
at  the  beginning  of  this  slot,  the  conditional  probability 
that  channel  n  is  available  can  be  calculated  as  Pr{S'n  = 
1  |  tui}  =  0.5  and  PrjS'n  =  1 1  tu2}  =  a  for  n  =  1,2. 
From  Fig.  2,  we  find  that  the  secondary  user  chooses  not 
to  sense  only  when  the  conditional  probability  Prl^n  = 
1  |  u:  \  that  the  channel  is  available  is  very  small.  We  also 
find  that  the  secondary  user  always  chooses  to  sense  if  the 
belief  state  is  given  by  the  stationary  distribution  u>i  of  the 
spectrum  occupancy  dynamics.  The  reason  behind  this  is 
the  monotonicity  of  the  value  function  V(lj,E)  in  terms 
of  battery  energy  E.  Specifically,  if  the  secondary  user 
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chooses  not  to  sense,  then  its  belief  state  at  the  beginning 
of  the  next  slot  will  remain  uj  i  but  its  battery  energy 
will  be  reduced  by  ep  due  to  energy  consumption  in  the 
sleeping  mode.  The  maximum  total  expected  reward  that 
can  be  obtained  is  thus  given  by  V(uj\,E  —  ep).  Since 
V  (uj  .  E)  increases  with  the  battery  energy  E  for  every 
fixed  uj,  we  have  V(uj\,E)  >  V(uj\,E  —  ep)  and  hence 
the  secondary  user  should  choose  to  sense  whenever  it  has 
a  stationary  belief  state. 


Fig.  3.  The  optimal  access  decision  under  different  sensing  energy 
consumptions  es  and  channel  fading  statistics.  N  =  2,  \B] .  B-f  = 
[1, 1],  £0  =  8,  ep  =  0.1.  L  =  3,  £fx  =  {1,  2,  3}.  In  the  upper  plot, 
pn(  1)  =  0.5,pn(2)  =  0.3,p„(3)  =  0.2  for  n  =  1,2,3.  In  the  lower 
plot,  p„(l)  =  0.3, pn (2)  =  0.3, pn (3)  =  0.4. 

To  access  or  not  to  access?  Without  an  energy 
constraint,  the  secondary  user  should  always  access  the 
channel  that  is  sensed  to  be  available.  However,  under 
an  energy  constraint,  the  access  decision  should  take 
into  account  both  the  energy  consumption  characteristics 
and  the  channel  fading  statistics.  For  example,  when  the 
sensed  channel  is  available  but  has  poor  fading  condition, 
should  the  secondary  user  access  this  channel  to  gain 
immediate  reward  or  wait  for  better  channel  realizations 
to  conserve  energy?  In  Fig.  3,  we  study  the  optimal  access 
decision  <j>*  under  different  sensing  energy  consumptions 
es  and  channel  fading  statistics  {pn(k)}%_ We  find  that 
when  sensing  energy  consumption  es  is  negligible,  the 
secondary  user  should  refrain  from  transmission  under 
poor  channel  conditions  and  wait  for  the  best  channel 
realization.  However,  when  es  is  large,  it  should  always 
grab  the  instantaneous  opportunity  regardless  of  the  fading 
condition  because  the  sensing  energy  consumed  in  waiting 
for  the  best  channel  realization  may  exceed  the  extra 
energy  consumed  in  combating  the  poor  channel  fading. 

The  access  decision  should  also  take  into  account 
the  channel  fading  statistics.  Compare  the  optimal  access 
decisions  in  the  upper  and  the  lower  plots  of  Fig.  3  when 
sensing  energy  is  es  =  0.8.  We  find  that  if  the  probability 
that  the  channel  experiences  deep  fading  is  small  (see  the 
upper  plot),  the  secondary  user  should  avoid  transmitting 
under  poor  channel  realizations  because  the  waiting  time 
for  a  better  channel  realization  is  short  and  hence  the 
energy  wasted  in  waiting  can  still  be  lower  than  the  extra 


energy  needed  to  combat  the  poor  channel  condition.  On 
the  other  hand,  if  the  channel  tends  to  have  poor  fading 
conditions  (see  the  lower  plot),  the  secondary  user  should 
focus  on  gaining  immediate  reward  because  of  the  long 
waiting  time  for  better  channel  realizations. 

CONCLUSIONS 

In  this  paper,  we  obtained  the  optimal  sensing  and 
access  policy  for  energy-constrained  OSA  by  formulating 
the  resulting  problem  as  an  unconstrained  POMDP.  We 
proposed  a  suboptimal  solution,  called  greedy-//;,  whose 
computational  complexity  can  be  systematically  traded 
off  with  its  performance.  Numerical  results  demonstrated 
that  the  optimal  sensing  and  access  decisions  should  take 
into  account  not  only  the  spectrum  occupancy  dynamics 
but  also  the  channel  fading  statistics  and  the  energy 
consumption  characteristics  of  the  secondary  user.  The 
cognitive  MAC  developed  in  this  paper  is  energy-efficient 
and  spectrum  adaptive,  and  should  find  applications  in 
FCS. 

REFERENCES 

DARPA,  2005:  DARPA:  The  Next  Generation  Program, 
http://www.darpa.mil/ato/programs/xg/index.htm. 

DySpan,  2005:  Proceedings  of  the  first  IEEE  Symposium  on  New 
Frontiers  in  Dynamic  Spectrum  Access  Networks,  Balti¬ 
more,  USA. 

CrownCom,  2006:  Proceedings  of  the  first  International  Con¬ 
ference  on  Cognitive  Radio  Oriented  Wireless  Networks  and 
Communications,  Mykonos  Island,  Greece. 

Mangold,  S.,  Z.  Zhong,  K.  Challapali,  and  C.  Chou,  2004:  Spec¬ 
trum  agile  radio:  radio  resource  measurements  for  oppor¬ 
tunistic  spectrum  usage.  Proc.  of  Globecom,  3467-3471. 
Larcher,  A.,  H.  Sun,  M.  van  der  Shaar,  and  Z.  Ding,  2004: 

Decentralized  transmission  strategies  for  delay-sensitive  ap¬ 
plications  over  spectrum  agile  networks.  Proc.  of  Interna¬ 
tional  Packet  Video  Workshop,  Irvine,  CA. 

Papadimitratos,  P,  S.  Sankaranarayanan.  and  A.  Mishra,  2005: 
A  bandwidth  sharing  approach  to  improve  licensed  spec¬ 
trum  utilization.  IEEE  Commun.  Magazine,  43,  10-14. 
Zhao,  Q.,  L.  Tong,  and  A.  Swami,  2005a:  Decentralized  cogni¬ 
tive  MAC  for  dynamic  spectrum  access.  Proc.  of  the  first 
IEEE  Symposium  on  New  Frontiers  in  Dynamic  Spectrum 
Access  Networks  (DySpan). 

Zhao,  Q.,  L.  Tong,  A.  Swami,  and  Y.  Chen,  2006:  Cross¬ 
layer  design  of  opportunistic  spectrum  access  in  the  pres¬ 
ence  of  sensing  error.  Proc.  of  the  40th  Conference  on 
Information  Science  and  Systems  (CISS). 

Zhao,  Q.,  L.  Tong,  and  A.  Swami,  2005b:  A  cross-layer  approach 
to  cognitive  MAC  for  spectrum  agility.  Proc.  of  Asilomar 
Conference  on  Signals,  Systems,  and  Computers. 
Smallwood,  R.  and  E.  Sondik,  1971:  The  optimal  control  of 
partially  observable  Markov  processes  over  a  finite  horizon. 
Operations  Research,  1071-1088. 

Djonin,  D.,  Q.  Zhao,  and  V.  Krishnamurthy,  1971:  Optimality 
and  complexity  of  opportunistic  spectrum  access:  a  partially 
observed  markov  decision  process  approach.  Submitted  to 
IEEE  ICC. 


8 


